This is an example of inserting an image and resizing it in the file.

Example of using SQLDF to Query

These are a few examples of running queries and how to select certain sets of data and also choosing which directories to search the data from, and then run an sqldf to run the results and display the query. Another technique used in these is sorting the data we display using the WHERE statement to make specific requirements for the data to meet in order to be displayed

query<-"SELECT playerID,yearID,teamID,HR FROM Batting
WHERE teamID='PHI' and yearID>=1970 and yearID<=1979 and HR>= 30"
sqldf(query)
##     playerID yearID teamID HR
## 1  johnsde01   1971    PHI 34
## 2  montawi01   1971    PHI 30
## 3  schmimi01   1974    PHI 36
## 4  luzingr01   1975    PHI 34
## 5  schmimi01   1975    PHI 38
## 6  schmimi01   1976    PHI 38
## 7  luzingr01   1977    PHI 39
## 8  schmimi01   1977    PHI 38
## 9  luzingr01   1978    PHI 35
## 10 schmimi01   1979    PHI 45
query<-"SELECT playerID,yearID,teamID,HR FROM Batting
WHERE teamID='NYA' and HR> 40"
sqldf(query)
##     playerID yearID teamID HR
## 1   ruthba01   1920    NYA 54
## 2   ruthba01   1921    NYA 59
## 3   ruthba01   1923    NYA 41
## 4   ruthba01   1924    NYA 46
## 5   ruthba01   1926    NYA 47
## 6  gehrilo01   1927    NYA 47
## 7   ruthba01   1927    NYA 60
## 8   ruthba01   1928    NYA 54
## 9   ruthba01   1929    NYA 46
## 10 gehrilo01   1930    NYA 41
## 11  ruthba01   1930    NYA 49
## 12 gehrilo01   1931    NYA 46
## 13  ruthba01   1931    NYA 46
## 14  ruthba01   1932    NYA 41
## 15 gehrilo01   1934    NYA 49
## 16 gehrilo01   1936    NYA 49
## 17 dimagjo01   1937    NYA 46
## 18 mantlmi01   1956    NYA 52
## 19 mantlmi01   1958    NYA 42
## 20 mantlmi01   1961    NYA 54
## 21 marisro01   1961    NYA 61
## 22 jacksre01   1980    NYA 41
## 23 martiti02   1997    NYA 44
## 24 giambja01   2002    NYA 41
## 25 giambja01   2003    NYA 41
## 26 rodrial01   2005    NYA 48
## 27 rodrial01   2007    NYA 54
## 28 grandcu01   2011    NYA 41
## 29 grandcu01   2012    NYA 43
query<-"SELECT playerID,yearID,teamID,HR,SO FROM Batting
WHERE HR> 40 and SO<=60"
sqldf(query)
##     playerID yearID teamID HR SO
## 1  hornsro01   1922    SLN 42 50
## 2  willicy01   1923    PHI 41 57
## 3    ottme01   1929    NY1 42 38
## 4   ruthba01   1929    NYA 46 60
## 5  gehrilo01   1931    NYA 46 56
## 6   ruthba01   1931    NYA 46 51
## 7  gehrilo01   1934    NYA 49 31
## 8  gehrilo01   1936    NYA 49 46
## 9  troskha01   1936    CLE 42 58
## 10 dimagjo01   1937    NYA 46 37
## 11  mizejo01   1940    SLN 43 49
## 12  mizejo01   1947    NY1 51 42
## 13 willite01   1949    BOS 43 48
## 14 kinerra01   1951    PIT 42 57
## 15 camparo01   1953    BRO 41 58
## 16 rosenal01   1953    CLE 43 48
## 17 kluszte01   1954    CIN 49 35
## 18  mayswi01   1954    NY1 41 57
## 19 kluszte01   1955    CIN 47 40
## 20  mayswi01   1955    NY1 51 60
## 21 aaronha01   1957    ML1 44 58
## 22 sievero01   1957    WS1 42 55
## 23 aaronha01   1969    ATL 44 47
## 24 aaronha01   1971    ATL 47 58
## 25 thomafr04   1993    CHA 41 54
## 26 bondsba01   2002    SFN 46 47
## 27 bondsba01   2003    SFN 45 58
## 28 bondsba01   2004    SFN 45 41
## 29 pujolal01   2004    SLN 46 52
## 30 pujolal01   2006    SLN 49 50

How to Order by Descending

This chunk of code is an example that shows a way to order the final results In a descending order, as opposed to the default asceding order it would return the data in.

query<-"SELECT playerID,teamID,yearID,HR FROM Batting
WHERE HR>50
ORDER BY HR DESC"
sqldf(query)
##     playerID teamID yearID HR
## 1  bondsba01    SFN   2001 73
## 2  mcgwima01    SLN   1998 70
## 3   sosasa01    CHN   1998 66
## 4  mcgwima01    SLN   1999 65
## 5   sosasa01    CHN   2001 64
## 6   sosasa01    CHN   1999 63
## 7  marisro01    NYA   1961 61
## 8   ruthba01    NYA   1927 60
## 9   ruthba01    NYA   1921 59
## 10  foxxji01    PHA   1932 58
## 11 greenha01    DET   1938 58
## 12 howarry01    PHI   2006 58
## 13 gonzalu01    ARI   2001 57
## 14 rodrial01    TEX   2002 57
## 15 wilsoha01    CHN   1930 56
## 16 griffke02    SEA   1997 56
## 17 griffke02    SEA   1998 56
## 18  ruthba01    NYA   1920 54
## 19  ruthba01    NYA   1928 54
## 20 kinerra01    PIT   1949 54
## 21 mantlmi01    NYA   1961 54
## 22 ortizda01    BOS   2006 54
## 23 rodrial01    NYA   2007 54
## 24 bautijo02    TOR   2010 54
## 25 davisch02    BAL   2013 53
## 26 mantlmi01    NYA   1956 52
## 27  mayswi01    SFN   1965 52
## 28 fostege01    CIN   1977 52
## 29 mcgwima01    OAK   1996 52
## 30 rodrial01    TEX   2001 52
## 31 thomeji01    CLE   2002 52
## 32 kinerra01    PIT   1947 51
## 33  mizejo01    NY1   1947 51
## 34  mayswi01    NY1   1955 51
## 35 fieldce01    DET   1990 51
## 36 jonesan01    ATL   2005 51

Searching with more criteria

This query is an example where we searched for the players who had the fewest Strike outs in a single season, but we used another criteria that they had to have also had atleast 400 at bats in the season as well.

# This finds and sorts who has had the fewest strikeouts in a season with atleast 400 at bats
query<-"SELECT teamID, playerID, yearID, SO, AB FROM Batting
WHERE AB>= 400 and SO< 10
ORDER BY SO"
sqldf(query)
##    teamID  playerID yearID SO  AB
## 1     NY1 doyleja01   1894  3 422
## 2     NYA seweljo01   1932  3 503
## 3     CLE seweljo01   1925  4 608
## 4     CLE seweljo01   1929  4 578
## 5     NYA seweljo01   1933  4 524
## 6     NY1  wardjo01   1893  5 588
## 7     CHN holloch01   1922  5 592
## 8     CLE mcinnst01   1922  5 537
## 9     PIT wanerll01   1936  5 414
## 10    BS1 wrighge01   1875  6 408
## 11    BSN broutda01   1889  6 485
## 12    BLN keelewi01   1894  6 590
## 13    NY1  wardjo01   1894  6 540
## 14    SLN quinnjo02   1895  6 543
## 15    BSN mcinnst01   1924  6 581
## 16    CLE seweljo01   1926  6 578
## 17    NY1  wardjo01   1889  7 479
## 18    PHI crossla01   1893  7 415
## 19    SLN quinnjo02   1893  7 547
## 20    PHI crossla01   1894  7 529
## 21    CIN vaughfa01   1896  7 433
## 22    PHA cochrmi01   1927  7 432
## 23    CLE seweljo01   1927  7 569
## 24    PIT traynpi01   1929  7 540
## 25    NY1 muelldo01   1956  7 453
## 26    NY1 connoro01   1885  8 455
## 27    IN3 glassja01   1887  8 483
## 28    NY1 glassja01   1890  8 512
## 29    PIT donovpa01   1893  8 499
## 30    CHN dungasa01   1893  8 465
## 31    LS3 pinknge01   1893  8 446
## 32    BLN brodist01   1894  8 573
## 33    SLN quinnjo02   1894  8 405
## 34    PIT bierblo01   1895  8 466
## 35    PHI crossla01   1895  8 535
## 36    CIN   hoydu01   1895  8 429
## 37    CIN roushed01   1921  8 418
## 38    CHA collied01   1923  8 505
## 39    CHA collied01   1925  8 425
## 40    WS1 speaktr01   1927  8 523
## 41    PHA cochrmi01   1929  8 514
## 42    NYA seweljo01   1931  8 484
## 43    PIT wanerll01   1933  8 500
## 44    PHI verbaem01   1947  8 540
## 45    CHN ansonca01   1883  9 413
## 46    DTN broutda01   1887  9 500
## 47    WAS   hoydu01   1893  9 564
## 48    PIT bierblo01   1894  9 525
## 49    BLN broutda01   1894  9 525
## 50    SLN milledo01   1894  9 481
## 51    BLN keelewi01   1896  9 544
## 52    CL4 mckeaed01   1896  9 571
## 53    CLE speaktr01   1918  9 471
## 54    CHN  dealch01   1921  9 422
## 55    BOS mcinnst01   1921  9 584
## 56    SLA severha01   1921  9 472
## 57    BSN  highan01   1926  9 476
## 58    CLE summaho01   1926  9 581
## 59    CLE seweljo01   1928  9 588
## 60    WS1  ricesa01   1929  9 616
## 61    NY1 leachfr01   1931  9 515
## 62    PHA busched01   1945  9 416
## 63    BSN holmeto01   1945  9 636
## 64    CLE boudrlo01   1948  9 560
## 65    CLE mitchda01   1952  9 511

Searching for all information of Specific player

This line of code is an example of specifically queing for the information about a certain player. You must find what the specific player’s ID is so you can search specifically for that person’s information if you search and group information based on the player’s ID

query<-"SELECT playerID,sum(HR) FROM Batting
WHERE playerID='ruthba01'
GROUP BY playerID"
sqldf(query)
##   playerID sum(HR)
## 1 ruthba01     714

Filtering with Grouping versus HAving

This is a very important mechanic when it comes to searching for data. When you filter specific information, you use the term WHERE if you filter before grouping data.If you are grouping AFTER you group information, then you use the HAVING command to filter information. This also uses a relatively new technique of gathering the sum of one of the results. In this, they sum the total number of HR’s and groups it by ID’s of each player

#Filter before group is where, and filter after group uses having
query<-"SELECT playerID,sum(HR) FROM Batting
GROUP BY playerID
HAVING sum(HR)>= 600
ORDER BY sum(HR) DESC"
sqldf(query)
##    playerID sum(HR)
## 1 bondsba01     762
## 2 aaronha01     755
## 3  ruthba01     714
## 4 rodrial01     696
## 5  mayswi01     660
## 6 griffke02     630
## 7 thomeji01     612
## 8  sosasa01     609

Use of Average function

This query shows the use of the avg function which is used similariliy to the sum function except it gathers the avg instead of the sum of the numbers.

#sum() avg() max() min() are all useable fucntions
query<-"SELECT playerID,avg(HR) FROM Batting
GROUP BY playerID
HAVING avg(HR)>= 30
ORDER BY avg(HR) DESC"
sqldf(query)
##     playerID  avg(HR)
## 1  pujolal01 36.93750
## 2  bondsba01 34.63636
## 3  mcgwima01 34.29412
## 4  kinerra01 33.54545
## 5  aaronha01 32.82609
## 6  bryankr01 32.50000
## 7   ruthba01 32.45455
## 8   sosasa01 32.05263
## 9  cabremi01 31.85714
## 10 belleal01 31.75000
## 11 rodrial01 31.63636
## 12 schmimi01 30.44444
## 13 abreujo02 30.33333

Drawing from multiple index at once

This next series of data queries involve a new mechanic we learned that involves drawing data from multiple different index at one time. In this, one of the issues is where multiple index have the same information, and you must make a comparison to equate the two ID’s. In this case, the playerID is connected in Batting and Master.

#This is us testing how to draw from multiple index at one
#Time and combine responses from these
query<-"SELECT nameFirst, nameLast, teamID, yearID, HR
FROM Batting INNER JOIN Master
On Batting.playerID=Master.playerID
WHERE Batting.playerID='ruthba01'"
sqldf(query)
##    nameFirst nameLast teamID yearID HR
## 1       Babe     Ruth    BOS   1914  0
## 2       Babe     Ruth    BOS   1915  4
## 3       Babe     Ruth    BOS   1916  3
## 4       Babe     Ruth    BOS   1917  2
## 5       Babe     Ruth    BOS   1918 11
## 6       Babe     Ruth    BOS   1919 29
## 7       Babe     Ruth    NYA   1920 54
## 8       Babe     Ruth    NYA   1921 59
## 9       Babe     Ruth    NYA   1922 35
## 10      Babe     Ruth    NYA   1923 41
## 11      Babe     Ruth    NYA   1924 46
## 12      Babe     Ruth    NYA   1925 25
## 13      Babe     Ruth    NYA   1926 47
## 14      Babe     Ruth    NYA   1927 60
## 15      Babe     Ruth    NYA   1928 54
## 16      Babe     Ruth    NYA   1929 46
## 17      Babe     Ruth    NYA   1930 49
## 18      Babe     Ruth    NYA   1931 46
## 19      Babe     Ruth    NYA   1932 41
## 20      Babe     Ruth    NYA   1933 34
## 21      Babe     Ruth    NYA   1934 22
## 22      Babe     Ruth    BSN   1935  6
#Draw from two queries

Drawing from 2 sources and overwriting specific Information

This is another example where the draw from two queries, and in this one they replace the playerID so that the First and Last name are displayed instead

#Drew from two querries and replaced playerID with
#Real name
query<-"SELECT nameFirst, nameLast, teamID, yearID, HR
FROM Batting INNER JOIN Master
On Batting.playerID=Master.playerID
WHERE HR> 50
ORDER BY HR DESC"
sqldf(query)
##    nameFirst  nameLast teamID yearID HR
## 1      Barry     Bonds    SFN   2001 73
## 2       Mark   McGwire    SLN   1998 70
## 3      Sammy      Sosa    CHN   1998 66
## 4       Mark   McGwire    SLN   1999 65
## 5      Sammy      Sosa    CHN   2001 64
## 6      Sammy      Sosa    CHN   1999 63
## 7      Roger     Maris    NYA   1961 61
## 8       Babe      Ruth    NYA   1927 60
## 9       Babe      Ruth    NYA   1921 59
## 10    Jimmie      Foxx    PHA   1932 58
## 11      Hank Greenberg    DET   1938 58
## 12      Ryan    Howard    PHI   2006 58
## 13      Luis  Gonzalez    ARI   2001 57
## 14      Alex Rodriguez    TEX   2002 57
## 15      Hack    Wilson    CHN   1930 56
## 16       Ken   Griffey    SEA   1997 56
## 17       Ken   Griffey    SEA   1998 56
## 18      Babe      Ruth    NYA   1920 54
## 19      Babe      Ruth    NYA   1928 54
## 20     Ralph     Kiner    PIT   1949 54
## 21    Mickey    Mantle    NYA   1961 54
## 22     David     Ortiz    BOS   2006 54
## 23      Alex Rodriguez    NYA   2007 54
## 24      Jose  Bautista    TOR   2010 54
## 25     Chris     Davis    BAL   2013 53
## 26    Mickey    Mantle    NYA   1956 52
## 27    Willie      Mays    SFN   1965 52
## 28    George    Foster    CIN   1977 52
## 29      Mark   McGwire    OAK   1996 52
## 30      Alex Rodriguez    TEX   2001 52
## 31       Jim     Thome    CLE   2002 52
## 32     Ralph     Kiner    PIT   1947 51
## 33    Johnny      Mize    NY1   1947 51
## 34    Willie      Mays    NY1   1955 51
## 35     Cecil   Fielder    DET   1990 51
## 36    Andruw     Jones    ATL   2005 51

Replace multiple datas with new data

This continues to build on what we worked on in the last query, and it will replace both the player and the teams full names

query<-"SELECT playerID,name,Batting.yearID, Batting.HR
FROM Batting INNER JOIN Teams
On Batting.teamID=Teams.teamID and Batting.yearID=Teams.yearID
WHERE playerID='ruthba01'"
sqldf(query)
##    playerID             name yearID HR
## 1  ruthba01   Boston Red Sox   1914  0
## 2  ruthba01   Boston Red Sox   1915  4
## 3  ruthba01   Boston Red Sox   1916  3
## 4  ruthba01   Boston Red Sox   1917  2
## 5  ruthba01   Boston Red Sox   1918 11
## 6  ruthba01   Boston Red Sox   1919 29
## 7  ruthba01 New York Yankees   1920 54
## 8  ruthba01 New York Yankees   1921 59
## 9  ruthba01 New York Yankees   1922 35
## 10 ruthba01 New York Yankees   1923 41
## 11 ruthba01 New York Yankees   1924 46
## 12 ruthba01 New York Yankees   1925 25
## 13 ruthba01 New York Yankees   1926 47
## 14 ruthba01 New York Yankees   1927 60
## 15 ruthba01 New York Yankees   1928 54
## 16 ruthba01 New York Yankees   1929 46
## 17 ruthba01 New York Yankees   1930 49
## 18 ruthba01 New York Yankees   1931 46
## 19 ruthba01 New York Yankees   1932 41
## 20 ruthba01 New York Yankees   1933 34
## 21 ruthba01 New York Yankees   1934 22
## 22 ruthba01    Boston Braves   1935  6

Combining everything We have Learned

This is an extensive combination of most of what we have learned and it combines 3 different index information, and it compares them specifically. In the SELECT portion we specify that the yearID and HR’s will be selected from the Batting index. This will provide all the information about Babe Ruth.

#This gives all the full data of all of Babe Ruth's Seasons with whole
#Names for his teams and himself
query<-"SELECT nameFirst,nameLast,name,Batting.yearID,Batting.HR 
FROM (Batting INNER JOIN Master
On Batting.playerID=Master.playerID) INNER JOIN Teams
On Batting.teamID=Teams.teamID AND Batting.yearID=Teams.yearID
WHERE Batting.playerID ='ruthba01'"
sqldf(query)
##    nameFirst nameLast             name yearID HR
## 1       Babe     Ruth   Boston Red Sox   1914  0
## 2       Babe     Ruth   Boston Red Sox   1915  4
## 3       Babe     Ruth   Boston Red Sox   1916  3
## 4       Babe     Ruth   Boston Red Sox   1917  2
## 5       Babe     Ruth   Boston Red Sox   1918 11
## 6       Babe     Ruth   Boston Red Sox   1919 29
## 7       Babe     Ruth New York Yankees   1920 54
## 8       Babe     Ruth New York Yankees   1921 59
## 9       Babe     Ruth New York Yankees   1922 35
## 10      Babe     Ruth New York Yankees   1923 41
## 11      Babe     Ruth New York Yankees   1924 46
## 12      Babe     Ruth New York Yankees   1925 25
## 13      Babe     Ruth New York Yankees   1926 47
## 14      Babe     Ruth New York Yankees   1927 60
## 15      Babe     Ruth New York Yankees   1928 54
## 16      Babe     Ruth New York Yankees   1929 46
## 17      Babe     Ruth New York Yankees   1930 49
## 18      Babe     Ruth New York Yankees   1931 46
## 19      Babe     Ruth New York Yankees   1932 41
## 20      Babe     Ruth New York Yankees   1933 34
## 21      Babe     Ruth New York Yankees   1934 22
## 22      Babe     Ruth    Boston Braves   1935  6

More final Examples

These next two functions continue to combine everything we have learned. It practices grouping, collecting sums from specific indexes, and ordering the data in different areas that range from number of home runs to the salaries of the players, or even just their first name.

#Finds all people ever with over 600 career homeruns and gives their full name
#and team name
query<-"SELECT nameFirst,nameLast,name,sum(Batting.HR),Batting.yearID 
FROM (Batting INNER JOIN Master
On Batting.playerID=Master.playerID) INNER JOIN Teams 
On Batting.teamID=Teams.teamID AND Batting.yearID=Teams.yearID
GROUP BY Batting.playerID
HAVING sum(Batting.HR)>= 600
ORDER BY sum(Batting.HR) DESC"
sqldf(query)
##   nameFirst  nameLast                 name sum(Batting.HR) yearID
## 1     Barry     Bonds San Francisco Giants             762   2007
## 2      Hank     Aaron    Milwaukee Brewers             755   1976
## 3      Babe      Ruth        Boston Braves             714   1935
## 4      Alex Rodriguez     New York Yankees             696   2016
## 5    Willie      Mays        New York Mets             660   1973
## 6       Ken   Griffey     Seattle Mariners             630   2010
## 7       Jim     Thome    Baltimore Orioles             612   2012
## 8     Sammy      Sosa        Texas Rangers             609   2007
#Finds people with the average salary over 1 million and the name BOB
query<-"SELECT namefirst,nameLast, avg(salary)
FROM Salaries INNER JOIN Master
On Salaries.playerID=Master.playerID
WHERE nameFirst='Bob'
GROUP BY Salaries.playerID
HAVING avg(salary) > 1000000
ORDER BY avg(salary) DESC" 
sqldf(query)
##   nameFirst  nameLast avg(salary)
## 1       Bob   Wickman     2635813
## 2       Bob     Welch     1930417
## 3       Bob     Howry     1804583
## 4       Bob    Horner     1416667
## 5       Bob Tewksbury     1288182
## 6       Bob   Stanley     1067500
## 7       Bob     Boone     1047580
## 8       Bob     Ojeda     1035926